Learning compound noun semantics
Author
Abstract
This thesis investigates computational approaches for analysing the semantic relations in compound nouns and other noun-noun constructions. Compound nouns in particular have received a great deal of attention in recent years due to the challenges they pose for natural language processing systems. One reason for this is that the semantic relation between the constituents of a compound is not explicitly expressed and must be retrieved from other sources of linguistic and world knowledge.
I present a new scheme for the semantic annotation of compounds, describing in detail the motivation for the scheme and the development process. This scheme is applied to create an annotated dataset for use in compound interpretation experiments. The results of a dual-annotator experiment indicate that good agreement can be obtained with this scheme relative to previously reported results and also provide insights into the challenging nature of the annotation task.
I describe two corpus-driven paradigms for comparing pairs of nouns: lexical similarity and relational similarity. Lexical similarity is based on comparing each constituent of a noun pair to the corresponding constituent of another pair. Relational similarity is based on comparing the contexts in which both constituents of a noun pair occur together with the corresponding contexts of another pair. Using the flexible framework of kernel methods, I develop techniques for implementing both similarity paradigms. A standard approach to lexical similarity represents words by their co-occurrence distributions. I describe a family of kernel functions that are designed for the classification of probability distributions. The appropriateness of these distributional kernels for semantic tasks is suggested by their close connection to proven measures of distributional lexical similarity.
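As a rough illustration of the lexical similarity paradigm described above, the following Python sketch compares two noun pairs constituent by constituent, using a kernel on co-occurrence distributions. The abstract does not say which distributional kernels are used, so the Jensen-Shannon-based kernel here is an assumption chosen for illustration, and the co-occurrence counts, function names and example compounds are all hypothetical.

```python
import math


def normalize(counts):
    """Turn raw co-occurrence counts into a probability distribution."""
    total = sum(counts.values())
    return {context: n / total for context, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Base-2 Jensen-Shannon divergence between two sparse distributions.

    Each distribution is a dict mapping a context word to its probability.
    The result lies between 0 (identical) and 1 (no shared contexts).
    """
    total = 0.0
    for context in set(p) | set(q):
        pi = p.get(context, 0.0)
        qi = q.get(context, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution at this context
        if pi > 0:
            total += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            total += 0.5 * qi * math.log2(qi / mi)
    return total


def jsd_kernel(p, q):
    """Similarity derived from JSD (an assumed choice, not named in the abstract)."""
    return 1.0 - jensen_shannon_divergence(p, q)


def lexical_pair_kernel(pair_a, pair_b, dists):
    """Compare first constituent with first, second with second, and average."""
    (a1, a2), (b1, b2) = pair_a, pair_b
    return 0.5 * (jsd_kernel(dists[a1], dists[b1]) + jsd_kernel(dists[a2], dists[b2]))


# Hypothetical toy co-occurrence counts (made up for this sketch).
COUNTS = {
    "apple": {"eat": 8, "sweet": 5, "tree": 3},
    "cherry": {"eat": 6, "sweet": 7, "tree": 2},
    "cake": {"bake": 9, "eat": 6, "sweet": 4},
    "pie": {"bake": 8, "eat": 7, "sweet": 3},
    "steel": {"metal": 9, "forge": 4},
    "knife": {"cut": 9, "sharp": 6},
}
DISTS = {word: normalize(c) for word, c in COUNTS.items()}

similar = lexical_pair_kernel(("apple", "cake"), ("cherry", "pie"), DISTS)
dissimilar = lexical_pair_kernel(("apple", "cake"), ("steel", "knife"), DISTS)
print(similar, dissimilar)
```

Averaging the two constituent-level kernels keeps the result a valid kernel, so a matrix of such values could in principle be passed to any kernel-based classifier that accepts precomputed kernels.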
I demonstrate the effectiveness of the lexical similarity model by applying it to two classification tasks: compound noun interpretation and the 2007 SemEval task on classifying semantic relations between nominals. To implement relational similarity I use kernels on strings and sets of strings. I show that distributional set kernels based on a multinomial probability model can be computed many times more efficiently than previously proposed kernels, while still achieving equal or better performance. Relational similarity does not perform as well as lexical similarity in my experiments. However, combining the two models brings an improvement over either model alone and achieves state-of-the-art results on both the compound noun and SemEval Task 4 datasets.
Acknowledgments
The past four years have been a never-boring mixture of learning and relearning, experiments that worked in …
Similar resources
Classification of Noun-Noun Compound Semantics in Dutch and Afrikaans
This article presents initial results on a supervised machine learning approach to determine the semantics of noun compounds in Dutch and Afrikaans. After a discussion of previous research on the topic, we present our annotation methods used to provide a training set of compounds with the appropriate semantic class. The support vector machine method used for this classification experiment utili...
Semantic classification of Dutch noun-noun compounds: A distributional semantics approach
This article describes the first attempt to semantically analyse Dutch noun-noun compounds using the distributional hypothesis, which states that the semantics of a word is implicitly represented by the words in its context. The purpose is not only to classify compounds based on their semantics. We also investigate in what circumstances this classification works best. Using Ó Séaghdha (2008) as...
Annotating and Learning Compound Noun Semantics
There is little consensus on a standard experimental design for the compound interpretation task. This paper introduces well-motivated general desiderata for semantic annotation schemes, and describes such a scheme for in-context compound annotation accompanied by detailed publicly available guidelines. Classification experiments on an open-text dataset compare favourably with previously reporte...
Annotation Guidelines for Compound Noun Semantics
relation is one of the 10 relation labels defined in section 2. direction specifies the order of the constituent nouns in the chosen relation’s argument structure – in particular, direction will have the value 1 if the first noun in the compound (N1) fits in the first noun slot mentioned in the rule licensing the chosen relation, and will have value 2 if the second noun in the compound (N2) fit...
Noun Compound Interpretation Using Paraphrasing Verbs: Feasibility Study
The paper addresses an important challenge for the automatic processing of English written text: understanding noun compounds’ semantics. Following Downing (1977) [1], we define noun compounds as sequences of nouns acting as a single noun, e.g., bee honey, apple cake, stem cell, etc. In our view, they are best characterised by the set of all possible paraphrasing verbs that can connect the targ...